ARRAY SEARCHING OPERATIONS




BACKGROUND 

This invention relates to array searching operations
for a computer.

Many conventional programmable processors, such as
digital signal processors (DSPs), support a rich instruction
set that includes numerous instructions for manipulating
arrays of data. These operations are typically
computationally intensive and can require significant
computing time, depending upon the number of execution
units, such as multiply-accumulate units (MACs), within the
processor.





DESCRIPTION OF DRAWINGS

Figure 1 is a block diagram illustrating an example of
a pipelined programmable processor according to the
invention.

Figure 2 is a block diagram illustrating an example
execution pipeline for the programmable processor.

Figure 3 is a flowchart for implementing an example
array manipulation machine instruction according to the
invention.

Figure 4 is a flowchart of an example routine for 
invoking the machine instruction. 

DESCRIPTION



Figure 1 is a block diagram illustrating a
programmable processor 2 having an execution pipeline 4 and
a control unit 6. Processor 2, as explained in detail
below, reduces the computational time required by array
manipulation operations. In particular, processor 2 may
support a machine instruction, referred to herein as the
SEARCH instruction, that reduces the computational time to
search an array of numbers in a pipelined processing
environment.

Pipeline 4 has a number of stages for processing 
instructions. Each stage processes concurrently with the 
other stages and passes results to the next stage in 
pipeline 4 at each clock cycle. The final results of each 
instruction emerge at the end of the pipeline in rapid 
succession. 

Control unit 6 controls the flow of instructions and
data through the various stages of pipeline 4. During the
processing of an instruction, for example, control unit 6
directs the various components of the pipeline to fetch
and decode the instruction, perform the corresponding
operation and write the results back to memory or local
registers.

Figure 2 illustrates an example pipeline 4 configured
according to the invention. Pipeline 4, for example, has
five stages: instruction fetch (IF), decode (DEC), address
calculation (AC), execute (EX) and write back (WB).
Instructions are fetched from memory, or from an
instruction cache, during the IF stage by fetch unit 21 and
decoded within address registers 22 during the DEC stage.
At the next clock cycle, the results pass to the AC stage,
where data address generators 23 calculate any memory
addresses that are necessary to perform the operation.

During the EX stage, execution units 25A through 25M
perform the specified operation such as, for example,
adding or multiplying numbers, in parallel. Execution
units 25 may contain specialized hardware for performing
the operations including, for example, one or more
arithmetic logic units (ALUs), floating-point units (FPUs)
and barrel shifters. A variety of data can be applied to
execution units 25 such as the addresses generated by data
address generators 23, data retrieved from data memory 18 or
data retrieved from data registers 24. During the final
stage (WB), the results are written back to data memory or
to data registers 24.

The SEARCH instruction supported by processor 2 may
allow software applications to search an array of N data
elements by issuing N/M search instructions, where M is the
number of data elements that can be processed in parallel
by execution units 25 of pipeline 4. Note, however, that a
single execution unit may be capable of executing two or
more operations in parallel. For example, an execution
unit may include a 32-bit ALU capable of concurrently
comparing two 16-bit numbers.
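
The instruction count and the packing of two 16-bit elements
into one 32-bit quantity can be sketched in Python as
follows. This is only an illustration: the function names
are hypothetical, rounding N/M up for arrays that are not a
multiple of M is an assumption, and the word order (even
element in the most significant word) follows the register
layout described later in this document.

```python
def search_instruction_count(n_elements: int, lanes: int) -> int:
    """Number of SEARCH instructions needed for N elements when M
    elements are processed in parallel (N/M, rounded up by assumption)."""
    return -(-n_elements // lanes)


def pack_pair(even: int, odd: int) -> int:
    """Pack two unsigned 16-bit elements into one 32-bit quantity,
    with the even element in the most significant word."""
    return ((even & 0xFFFF) << 16) | (odd & 0xFFFF)


def unpack_pair(word: int) -> tuple:
    """Split a 32-bit quantity back into its (even, odd) elements."""
    return (word >> 16) & 0xFFFF, word & 0xFFFF
```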

Generally, the sequence of SEARCH instructions allows
the processor to process M sets of elements in parallel to
identify an "extreme value", such as a maximum or a
minimum, for each set. During the execution of the search
instructions, processor 2 stores references to the location
of the extreme value of each of the M sets of elements.
Upon completion of the N/M instructions, as described in
detail below, the software application analyzes the
references to the extreme values for each set to quickly
identify an extreme value for the array. For example, the
instruction allows the software applications to quickly
identify either the first or last occurrence of a maximum
or minimum value. Furthermore, as explained in detail
below, processor 2 implements the operation in a fashion
suitable for vectorizing in a pipelined processor across
the M execution units 25.

As described above, a software application searches an
array of data by issuing N/M SEARCH machine instructions to
processor 2. Figure 3 is a flowchart illustrating an
example mode of operation 20 for processor 2 when it
receives a single SEARCH machine instruction. Process 20
is described with reference to identifying the last
occurrence of a minimum value within the array of elements;
however, process 20 can be easily modified to perform other
functions such as identifying the first occurrence of a
minimum value, the first occurrence of a maximum value or a
last occurrence of a maximum value.

For exemplary purposes, process 20 is described
assuming M equals 2, i.e., processor 2 concurrently
processes two sets of elements, each set having N/2
elements. However, the process is not limited as such and
is readily extensible to concurrently process more than two
sets of elements. In general, process 20 facilitates
vectorization of the search process by fetching pairs of
elements as a single data quantity and processing the
element pairs through pipeline 4 in parallel, thereby
reducing the total number of clock cycles necessary to
identify the minimum value within the array. Although
applicable to other architectures, process 20 is well
suited for a pipelined processor 2 having multiple
execution units in the EX stage. For each set of elements,
process 20 maintains two pointer registers, P_even and P_odd,
that store locations for the current extreme value within
the corresponding set. In addition, process 20 maintains
two accumulators, A0 and A1, that hold the current extreme
values for the sets. The pointer registers and the
accumulators, however, may readily be implemented as
general-purpose data registers without departing from
process 30.

Referring to Figure 3, in response to each SEARCH
instruction, processor 2 fetches a pair of elements in one
clock cycle as a single data quantity (21). For example,
processor 2 may fetch two adjacent 16-bit values as one
32-bit quantity. Next, processor 2 compares the even element
of the pair to a current minimum value for the even
elements (22) and the odd element of the pair to a current
minimum value for the odd elements (24).

When a new minimum value for the even elements is
detected, processor 2 updates accumulator A0 to hold the
new minimum value and updates a pointer register P_even to
hold a pointer to the corresponding data quantity within the
array (23). Similarly, when a new minimum value for the
odd elements has been detected, processor 2 updates
accumulator A1 and a pointer register P_odd (25). In this
example, each pointer register P_even and P_odd points to the
data quantity and not the individual elements, although the
process is not limited as such. Processor 2 repeats the
process until all of the elements within the array have
been processed (26). Because processor 2 is pipelined,
element pairs may be fetched until the array is processed.
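
The per-pair behavior of process 20 can be emulated in
software. The following Python sketch is a model only, not
the hardware implementation: the function name is
hypothetical, the accumulators are assumed to start at
positive infinity, and the pointers are modeled as indices
of the data quantity (pair) rather than memory addresses.

```python
def search_min_pairs(pairs):
    """Emulate one SEARCH per (even, odd) pair, tracking the last
    occurrence of the minimum in each of the two element sets."""
    a0 = a1 = float("inf")   # assumed initial accumulator values
    p_even = p_odd = 0       # point at the first data quantity
    for index, (even, odd) in enumerate(pairs):   # fetch a pair (21)
        if even <= a0:                            # compare even (22)
            a0, p_even = even, index              # update A0, P_even (23)
        if odd <= a1:                             # compare odd (24)
            a1, p_odd = odd, index                # update A1, P_odd (25)
    return a0, p_even, a1, p_odd
```

Using "less than or equal" rather than "less than" is what
makes a later equal value replace an earlier one, so each
pointer ends at the last occurrence within its set.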

The following illustrates exemplary syntax for 
invoking the machine instruction: 

(P_odd, P_even) = SEARCH R_data LE, R_data = [P_fetch_addr++]

Data register R_data is used as a scratch register to
store each newly fetched data element pair, with the least
significant word of R_data holding the odd element and the
most significant word of R_data holding the even element.
Two accumulators, A0 and A1, are implicitly used to store
the actual values of the results. An additional register,
P_fetch_addr, is incremented when the SEARCH instruction is
issued and is used as a pointer to iterate over the N/2
data quantities within the array. The defined condition,
such as "less than or equal" (LE) in the above example,
controls which comparison is executed and when the pointer
registers P_even and P_odd, as well as the accumulators A0 and
A1, are updated. The "LE", for example, directs
processor 2 to identify the last occurrence of the minimum
value.
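
The effect of the condition on which occurrence is found can
be illustrated with a short Python sketch. Only the LE
condition is named in the text above; the use of "less
than", "greater than or equal" and "greater than" for the
other three search functions is an assumption made for
illustration, as is the function name.

```python
import operator


def track_extreme(values, condition):
    """Scan values once, replacing the current extreme whenever
    condition(new_value, current_extreme) holds.
    Returns (extreme_value, position)."""
    best, position = values[0], 0
    for index, value in enumerate(values[1:], start=1):
        if condition(value, best):
            best, position = value, index
    return best, position


values = [4, 1, 7, 1, 7]
last_min = track_extreme(values, operator.le)    # LE: last minimum
first_min = track_extreme(values, operator.lt)   # assumed: first minimum
last_max = track_extreme(values, operator.ge)    # assumed: last maximum
first_max = track_extreme(values, operator.gt)   # assumed: first maximum
```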

In a typical application, a programmer develops a
software application or subroutine that issues the N/M
search instructions, probably from within a loop construct.
The programmer may write the software application in
assembly language or in a high-level software language. A
compiler is typically invoked to process the high-level
software application and generate the appropriate machine
instructions for processor 2, including the SEARCH machine
instructions for searching the array of data.

Figure 4 is a flowchart of an example software routine
30 for invoking the example machine instructions
illustrated above. First, the software routine 30
initializes the registers including initializing A0 and A1
and pointing P_even and P_odd to the first data quantity within
the array (31). In one embodiment, software routine 30
initializes a loop count register with the number of SEARCH
instructions to issue (N/M). Next, routine 30 issues the
SEARCH machine instruction N/M times. This can be
accomplished a number of ways, such as by invoking a
hardware loop construct supported by processor 2. Often,
however, a compiler may unroll a software loop into a
sequence of identical SEARCH instructions (32).

After issuing N/M search instructions, A0 and A1 hold
the last occurrence of the minimum even value and the last
occurrence of the minimum odd value, respectively.
Furthermore, P_even and P_odd store the locations of the two
data quantities that hold the last occurrence of the
minimum even value and the last occurrence of the minimum
odd value.



Next, in order to identify the last occurrence of the
minimum value for the entire array, routine 30 first
increments P_odd by a single element, such that P_odd points
directly at the minimum odd element (33). Routine 30
compares the accumulators A0 and A1 to determine whether
the accumulators contain the same value, i.e., whether the
minimum of the odd elements equals the minimum of the even
elements (34). If so, routine 30 compares the pointers
P_odd and P_even to determine whether P_odd is less than P_even
and, therefore, whether the minimum even value occurred
earlier in the array (35). Based on the comparison, the
routine determines whether to copy P_odd into P_even (37).

When the accumulators A0 and A1 are not the same, the
routine compares A0 to A1 in order to determine which holds
the minimum value (36). If A1 is less than A0, then routine
30 sets P_even equal to P_odd, thereby copying the pointer to
the minimum value from P_odd into P_even (37).

At this point, P_even points to the last occurrence of
the minimum value for the entire array. Next, routine 30
adjusts P_even to compensate for errors introduced by the
pipelined architecture of processor 2 (38). For example,
the comparisons described above are typically performed in
the EX stage of pipeline 4 while incrementing the pointer
register P_fetch_addr typically occurs during the AC stage,
thereby causing P_odd and P_even to be incorrect by a known
quantity. After adjusting P_even, routine 30 returns P_even as
a pointer to the last occurrence of the minimum value
within the array (39).
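
The steps of routine 30 can be modeled end to end in Python.
This is a sketch under stated assumptions, not the routine
itself: the function name is hypothetical, the array is
assumed to be non-empty with an even number of elements, the
accumulators are assumed to start at positive infinity, and
pointers are modeled as element indices. The pipeline
adjustment of step (38) is omitted because a sequential
software model introduces no such offset.

```python
def last_min_index(data):
    """Return the index of the last occurrence of the minimum of
    data, following steps (31)-(39) with M = 2."""
    a0 = a1 = float("inf")            # (31) assumed initial values
    p_even = p_odd = 0                # (31) first data quantity
    for fetch_addr in range(0, len(data), 2):   # (32) N/2 SEARCHes
        even, odd = data[fetch_addr], data[fetch_addr + 1]
        if even <= a0:
            a0, p_even = even, fetch_addr
        if odd <= a1:
            a1, p_odd = odd, fetch_addr
    p_odd += 1                        # (33) point at the odd element
    if a0 == a1:                      # (34) same minimum in both sets
        if p_odd > p_even:            # (35) odd minimum occurs later
            p_even = p_odd            # (37)
    elif a1 < a0:                     # (36) odd set holds the minimum
        p_even = p_odd                # (37)
    return p_even                     # (39)
```

For example, `last_min_index([1, 7, 4, 1])` returns 3, since
both sets share the minimum 1 and the odd occurrence is
later in the array.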



Figure 5 illustrates the operation for a single SEARCH
instruction as generalized to the case where processor 2 is
capable of processing M elements of the array in parallel,
such as when processor 2 includes M execution units. The
SEARCH instruction causes processor 2 to fetch M elements
in a single fetch cycle (51). Furthermore, in this
example, processor 2 maintains M pointer registers to store
addresses (locations) of a corresponding extreme value for
each of the M sets of elements. After fetching the M
elements, processor 2 concurrently compares the M elements
to a current extreme value for the respective element set,
as stored in M accumulators (52). Based on the
comparisons, processor 2 updates the M accumulators and the
M pointer registers (53).
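
A single generalized SEARCH step can be modeled as below.
The sketch is illustrative only: the function name and
argument layout are hypothetical, the comparison is fixed to
"less than or equal" (last occurrence of a minimum), and the
lanes are updated in a sequential loop, whereas the
processor described above performs the M comparisons
concurrently.

```python
def search_step(elements, fetch_addr, accumulators, pointers):
    """Model one SEARCH over M fetched elements (51): compare each
    element with its lane's accumulator (52) and update that lane's
    accumulator and pointer in place when the condition holds (53)."""
    for lane, value in enumerate(elements):
        if value <= accumulators[lane]:
            accumulators[lane] = value
            pointers[lane] = fetch_addr


# Usage: two steps over an 8-element array with M = 4.
accumulators = [float("inf")] * 4   # assumed initial values
pointers = [0] * 4
search_step([4, 2, 9, 1], 0, accumulators, pointers)
search_step([4, 3, 0, 1], 4, accumulators, pointers)
```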

Figure 6 illustrates the general case where a software
application issues N/M SEARCH instructions and, upon
completion of the instructions, determines the extreme
value for the entire array. First, the software
application initializes a loop counter, the M accumulators
used to store the current extreme values for the M element
sets and the M pointers used to store the locations of the
extreme values (61). Next, the software application issues
N/M SEARCH instructions (62). After completion of the
instructions, the software application may adjust the M
pointer registers so that each correctly references its
respective extreme value, instead of the data quantity
holding the extreme value (63). After adjusting the pointer
registers, the software application compares the M extreme
values for the M element sets to identify an extreme value
for the entire array, i.e., a maximum value or a minimum
value (64). Then, the software application may use the
pointer registers to determine whether more than one of the
element sets has an extreme value equal to the array extreme
value and, if so, determine which extreme value occurred
first, or last, depending upon the desired search function
(65).
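
The general case can be sketched in Python for a minimum
search. The function name and the want_last flag are
hypothetical, the array length is assumed to be a non-zero
multiple of M, the accumulators are assumed to start at
positive infinity, and pointers are modeled as element
indices rather than addresses.

```python
def array_minimum(data, lanes, want_last=True):
    """Return (minimum, index) for data using M = lanes element sets,
    choosing the last or the first occurrence of the minimum."""
    accumulators = [float("inf")] * lanes            # (61)
    pointers = [0] * lanes                           # (61)
    for fetch_addr in range(0, len(data), lanes):    # (62) N/M steps
        for lane in range(lanes):
            value = data[fetch_addr + lane]
            if want_last:
                better = value <= accumulators[lane]
            else:
                better = value < accumulators[lane]
            if better:
                accumulators[lane] = value
                pointers[lane] = fetch_addr
    # (63) make each pointer reference the element, not the quantity
    pointers = [p + lane for lane, p in enumerate(pointers)]
    minimum = min(accumulators)                      # (64)
    # (65) several sets may share the minimum; pick by position
    candidates = [p for a, p in zip(accumulators, pointers) if a == minimum]
    return minimum, (max(candidates) if want_last else min(candidates))
```

With `data = [3, 1, 4, 1, 5, 9, 2, 6]` and `lanes = 4`, two
element sets hold the minimum 1, and step (65) selects index
3 for the last occurrence or index 1 for the first.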

Various embodiments of the invention have been 
described. For example, a single machine instruction has 
been described that searches an array of data in a manner 
that facilitates vectorization of the search process within 
a pipelined processor. The processor may be implemented in 
a variety of systems including general purpose computing 
systems, digital processing systems, laptop computers, 
personal digital assistants (PDA's) and cellular phones. 
For example, cellular phones often maintain an array of
values representing signal strength for services available
360° around the phone. In this context, the process
discussed above can be readily used upon initialization of 
the cellular phone to scan the available services and 
quickly select the best service. In such a system, the 
processor may be coupled to a memory device, such as a 
FLASH memory device or a static random access memory 
(SRAM), that stores an operating system and other software
applications. These and other embodiments are within the 
scope of the following claims. 
What is claimed is: 



